1. Introduction

There is a great difference in the electricity consumption
patterns of different types of users, such as domestic, commercial,

Nomenclature

$U$  membership degree matrix
$V$  cluster center matrix
$n$  number of data objects
$c$  number of clusters
$\mu_{ij}$  membership degree of $x_j$ to $v_i$
$E$  sum of squared errors of all sample objects
$m$  fuzziness parameter in FCM
$C_i$  $i$th cluster
osmaga SG 


$x_j$  $j$th data object
$v_i$  cluster center of $C_i$
$d(x_j,v_i)$  Euclidean distance of $x_j$ to $v_i$, $d(x_j,v_i)=\|x_j-v_i\|^2$
$J_m(U,V)$  objective function of FCM

Subscripts

$i$  $i$th cluster
$j$  $j$th data object
$k$  $k$th cluster

planning, the making of competitive market policies and the 
provision of more personalized electric power services for electri- 
city producers [1,2], but also help different electricity users to 
enhance the understanding of their electricity consumption pat- 
terns. Moreover, users can adjust their electricity consumption 
strategies more economically and optimally based on the knowl- 
edge discovered from load classification. Hence, the electricity 
consumption costs will be reduced and the energy use efficiency 
will be improved more significantly [3]. 

Load classification is to partition various load patterns into 
groups so that load patterns in the same group are more similar to 
each other than to load patterns in other groups based on various 
clustering algorithms [4,5]. The characteristic load pattern is used 
to represent and describe the load patterns in the same group. 
Load classification is an important part of load modeling; therefore,
the accuracy of load classification directly affects the
reasonableness and effectiveness of load modeling [6].

Load classification is a process that consists of several steps,
such as load classification preparation, load classification
implementation using a clustering method, and the understanding
and application of the classification results. The process model and
specific steps of load classification are presented in Section 2.

In a smart grid environment [7], a large amount of load
data will be measured and collected by advanced load measuring
equipment. The scale of the load data collected will be larger, and
the structure will be more complex. Moreover, the form of load
data in a smart grid environment will be more flexible. Therefore,
mining and extracting valuable knowledge from the massive
electric power load data in a smart grid environment is an
important research direction.

Based on a summary and analysis of existing research
on electric power load classification, a five-stage process
model of load classification in a smart grid environment is
established in Section 2. The commonly used clustering methods and
result evaluation methods of load classification are reviewed and
summarized in Section 3. Section 4 presents the applications of
load classification, including bad data identification and
correction, load forecasting, and tariff setting. Section 5 gives an
example of load classification based on the Fuzzy c-means (FCM)
algorithm [8]. Finally, conclusions are drawn in Section 6.


2. Process model of load classification 


The electric power load data in smart grid is big. Specifically, its 
scale is large, its structure is complex and heterogeneous, its 
dimension is high, and its form is real-time and dynamic. These 
characteristics make the load classification in smart grid even 
more difficult. Hence, a definite model of load classification is 
necessary. 

As shown in Fig. 1, there are five stages in the process of
load classification, namely, load data preparation, load data
clustering preparation, load data clustering implementation,
understanding and evaluation of load classification results, and
applications of load classification results.

The preparation of input data for load classification is the first
step. The power load conditions are determined first, according to
the dimensions of time, region and type of substation. Sample data
are then selected from the massive load data using sampling
methods. Afterwards, the selected sample load data are normalized,
and outliers and noise data are identified and corrected.
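The normalization step can be illustrated with a minimal sketch. It assumes that each daily load profile is scaled by its own peak value, which is only one common choice; the text above does not specify the normalization method. The function name `normalize_by_peak` and the sample readings are hypothetical.

```python
from typing import List


def normalize_by_peak(profile: List[float]) -> List[float]:
    """Scale one load profile so that its maximum value becomes 1.0."""
    peak = max(profile)
    if peak <= 0:
        raise ValueError("profile must contain at least one positive load value")
    return [value / peak for value in profile]


# Hypothetical load readings (kW) for four consecutive hours.
raw_profile = [20.0, 40.0, 80.0, 60.0]
normalized_profile = normalize_by_peak(raw_profile)
```

After this scaling, profiles of consumers with very different absolute demand can be compared by shape alone, which is what the clustering stage relies on.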

The load data clustering preparation stage includes determining
the classification characteristics, choosing an appropriate
clustering algorithm and determining its parameters. Three
kinds of load characteristic indices, descriptive, comparative and
curved, are summarized in [9]. The well-known clustering
algorithms used for load classification include K-means, FCM and
the hierarchical clustering method. The parameters of a clustering
algorithm include, for example, the initial cluster centers, the
number of clusters, and the fuzziness parameter in FCM.

The third step is to run the selected clustering algorithm, with
its parameters, on the pre-processed load data using the selected
classification characteristics.

After the load data clustering, we need to understand and
evaluate the classification results. The classification results are
generally presented as a certain number of groups of load patterns
and their corresponding representative load patterns. The
information and characteristics of each group of load patterns need
to be described and understood. In addition, cluster validity indices
are generally used to validate the quality of clustering results.

The ultimate goal of the load classification process is to support
the decision-making of power system participants. Based on the
knowledge and information discovered from load classification,
demand-side management can be implemented. Load classification can
also improve the practicality of bad data identification and
correction, the accuracy of load forecasting, and the
appropriateness of tariff setting.

Fig. 1. Process model of load classification.

Fig. 2. Clustering methods that can be used for load classification.

Table 1
Studies about four commonly used clustering methods for load classification.

Clustering algorithm | Short description | Pros and cons | References
K-means | A classical partitioning crisp clustering method | Simple, efficient and scalable; difficult to determine initial cluster centers and cluster numbers, sensitive to noise and outliers, can only be used for spherical data, etc. | [15-18]
FCM | A well-known local search fuzzy clustering method, also a partitioning method | Membership degree of fuzzy partitions is introduced; difficult to determine initial cluster centers and cluster numbers, easy to fall into local optimum, etc. | [19-24]
Hierarchical clustering method | The bottom-up aggregation or top-down split of groups | Easy to implement; difficult to select the agglomerate and split points, etc. | [25-28]
SOM | A kind of unsupervised neural network method | Can identify the most significant characteristics with self-stability, has a strong anti-noise ability; the learning efficiency depends on the input order of sample objects when the number of objects is small; affected by factors such as the weights of network connection, the adjustment of learning efficiency, the selection of neighborhood function, etc. | [29-32]


3. Clustering methods and result evaluation methods for load 
classification 


3.1. Clustering methods for load classification 


Clustering methods [10,11] can be grouped into five categories
based on the clustering criterion, and each category contains many
specific clustering methods. For example, partitioning methods
include K-means, FCM, PAM, etc. All of these clustering methods,
which are summarized in Fig. 2, can be used for load classification.

We should note that no single clustering method is always
superior to the others for load classification, as is also the case
in other applications. Some methods are more commonly used for load
classification than others because they are easier to operate or
yield better results.

We give a brief introduction to the four commonly used
clustering methods for load classification, K-means [12], FCM [8],
the hierarchical clustering method [13] and self-organization
mapping (SOM) [14], in Sections 3.1.1-3.1.4. The four methods and
corresponding references are summarized in Table 1.

In addition to the above four commonly used methods, some
new methods, such as Support Vector Clustering [33], FaiNet [34],
honey bee mating optimization [35], the ant colony optimization
algorithm [36], follow the leader [37], iterative refinement
clustering [38], and ISODATA [39], have also been studied and used
for load classification.

Although the configuration of platforms, software and hardware
differs somewhat between clustering methods, some key requirements
are common to all implementations of load classification: a load
data measuring and collection platform such as AMR (Automated
Meter Reading), computing software such as MATLAB, SPSS or R, and
high-performance computers.


3.1.1. K-means algorithm 

The K-means algorithm [12] is a classical crisp clustering
method used for load classification. Its basic idea is as follows:
once the number of clusters c is determined, c initial cluster
centers are selected randomly, and each remaining object is
allocated to its nearest cluster according to the distance between
the object and the cluster centers. These operations are iterated
until the criterion function shown in Eq. (1) converges to within a
given range.


$$E=\sum_{i=1}^{c}\sum_{x_j\in C_i} d(x_j,v_i) \qquad (1)$$
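The iteration described above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the cited studies: it assumes the standard center update (each center becomes the mean of its members), uses the squared Euclidean distance from the Nomenclature, and stops when the criterion E of Eq. (1) no longer decreases by more than a tolerance. The names `k_means`, `sq_dist`, `tol` and `seed`, and the sample data, are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance, matching d(x_j, v_i) in the Nomenclature."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def k_means(data: List[Vector], c: int, max_iter: int = 100,
            tol: float = 1e-9, seed: int = 0) -> Tuple[List[List[float]], List[int], float]:
    """Return (centers, labels, E), where E is the criterion of Eq. (1)."""
    rng = random.Random(seed)
    centers = [list(x) for x in rng.sample(data, c)]  # random initial centers
    labels: List[int] = []
    e = previous_e = float("inf")
    for _ in range(max_iter):
        # Assignment step: each object goes to its nearest cluster center.
        labels = [min(range(c), key=lambda i: sq_dist(x, centers[i])) for x in data]
        # Update step (assumed): each center becomes the mean of its members.
        for i in range(c):
            members = [x for x, label in zip(data, labels) if label == i]
            if members:  # keep the old center if a cluster is empty
                centers[i] = [sum(col) / len(members) for col in zip(*members)]
        # Criterion function E of Eq. (1).
        e = sum(sq_dist(x, centers[label]) for x, label in zip(data, labels))
        if previous_e - e <= tol:
            break
        previous_e = e
    return centers, labels, e


# Hypothetical one-feature descriptors (e.g. mean daily load in kW) of six consumers.
data = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
centers, labels, e = k_means(data, c=2)
```

On data with several features, the result can depend on the randomly chosen initial centers, so the sketch would normally be run with several seeds.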


The operation of K-means is simple, efficient and scalable.
Hence, it is a commonly used crisp clustering method for load
classification. However, the deficiencies of K-means include: (1)
the selection of initial cluster centers can significantly affect the
result of the algorithm, (2) determining the appropriate number of
clusters is difficult, (3) it is sensitive to noise and outliers, and
(4) it can only be used to find groups in spherical data sets.
Therefore, the K-means algorithm used in load classification is
usually modified or optimized [15-18].

Table 2
Well-known CVIs for the evaluation of load classification results.

Type | CVI | Definition | References
Crisp clustering | Dunn | $\mathrm{Dunn}=\min_{i}\min_{j\neq i}\left(D(C_i,C_j)/\max_{k}\delta(C_k)\right)$, where $D(C_i,C_j)=\min_{x\in C_i,\,y\in C_j}d(x,y)$ and $\delta(C_k)=\max_{x,y\in C_k}d(x,y)$ | [43]
Crisp clustering | Dunn variants | 6*3 variants of Dunn based on different $D(C_i,C_j)$ and $\delta(C_i)$ | [44]
Crisp clustering | C | $C=(S-S_{\min})/(S_{\max}-S_{\min})$, where $S$ is the sum of $d(x_i,x_j)$ over the $n_w$ pairs of objects in the same cluster, and $S_{\min}$ and $S_{\max}$ are the sums of the $n_w$ smallest and the $n_w$ largest pairwise distances in the data set | [45]
Crisp clustering | Gamma | $\Gamma=(S_{+}-S_{-})/(S_{+}+S_{-})$ | [46]
Crisp clustering | CS | see reference | [47]
Crisp clustering | I | $I=\left(\frac{1}{c}\cdot\frac{E_1}{E_c}\cdot\max_{i,j}d(v_i,v_j)\right)^{p}$, where $E_c=\sum_{i=1}^{c}\sum_{x_j\in C_i}d(x_j,v_i)$ | [48]
Crisp clustering | SF | see reference | [49]
Crisp clustering | COP | see reference | [50]
Fuzzy clustering | PC, PE | $PC=\frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^{2}$, $PE=-\frac{1}{n}\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}\log_a\mu_{ij}$ | [51]
Fuzzy clustering | FS | $FS=\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^{m}\left(d(x_j,v_i)-d(v_i,\bar{x})\right)$ | [52]
Fuzzy clustering | XB | see reference | [53]
Fuzzy clustering | NFI | see reference | [54]
Fuzzy clustering | SC | see reference | [55]
Fuzzy clustering | PBMF | see reference | [56]
Fuzzy clustering | PCAES | see reference | [57]
Fuzzy clustering | CO | see reference | [58]

3.1.2. FCM

FCM [8] is a well-known local search fuzzy clustering method. In
crisp clustering, a data object belongs to one and only one group.
In fuzzy clustering, by contrast, a data object is not strictly
assigned to a single group but belongs to more than one group, with
a certain degree of membership to each group.

The FCM algorithm starts by determining the number of clusters
and guessing the initial cluster centers. Every data object is then
assigned a membership degree to each cluster. The cluster centers
and the corresponding membership degrees are updated iteratively by
minimizing the objective function until the positions of the
cluster centers no longer change or the difference between the
objective function values of two successive iterations falls within
a permitted tolerance.

The objective function of the FCM algorithm is defined as

$$J_m(U,V)=\sum_{i=1}^{c}\sum_{j=1}^{n}\mu_{ij}^{m}\,d(x_j,v_i) \qquad (2)$$

The iterative procedure updates the memberships $\mu_{ij}$ and the
cluster centers $v_i$ by

$$\mu_{ij}=\frac{1}{\sum_{k=1}^{c}\left(d(x_j,v_i)/d(x_j,v_k)\right)^{1/(m-1)}} \qquad (3)$$

$$v_i=\frac{\sum_{j=1}^{n}\mu_{ij}^{m}\,x_j}{\sum_{j=1}^{n}\mu_{ij}^{m}} \qquad (4)$$

where $d$ is the squared Euclidean distance defined in the
Nomenclature and $\mu_{ij}$ satisfies

$$\mu_{ij}\in[0,1] \qquad (5)$$

$$\sum_{i=1}^{c}\mu_{ij}=1,\quad \forall j=1,\ldots,n \qquad (6)$$

$$0<\sum_{j=1}^{n}\mu_{ij}<n,\quad \forall i=1,\ldots,c \qquad (7)$$
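The alternating updates of the memberships and the cluster centers can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the cited studies: distances are squared Euclidean distances as in the Nomenclature, so the membership exponent is 1/(m-1); initial centers are drawn at random from the data; and an object that coincides with a center is given full membership of that cluster to avoid division by zero. The names `fcm`, `sq_dist`, `tol` and `seed`, and the sample data, are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance d(x_j, v_i)."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fcm(data: List[Vector], c: int, m: float = 2.0, max_iter: int = 200,
        tol: float = 1e-10, seed: int = 0) -> Tuple[List[List[float]], List[List[float]], float]:
    """Return (U, V, J_m): memberships u[i][j], centers, and objective value."""
    rng = random.Random(seed)
    n, dim = len(data), len(data[0])
    centers = [list(x) for x in rng.sample(data, c)]  # random initial centers
    u = [[0.0] * n for _ in range(c)]
    objective = previous = float("inf")
    for _ in range(max_iter):
        # Membership update: mu_ij = 1 / sum_k (d_ij / d_kj)^(1/(m-1)).
        for j, x in enumerate(data):
            dists = [sq_dist(x, v) for v in centers]
            if min(dists) == 0.0:
                hit = dists.index(0.0)  # object sits exactly on a center
                for i in range(c):
                    u[i][j] = 1.0 if i == hit else 0.0
            else:
                for i in range(c):
                    u[i][j] = 1.0 / sum((dists[i] / dists[k]) ** (1.0 / (m - 1.0))
                                        for k in range(c))
        # Center update: membership-weighted mean of the data objects.
        for i in range(c):
            weights = [u[i][j] ** m for j in range(n)]
            total = sum(weights)
            centers[i] = [sum(w * x[d] for w, x in zip(weights, data)) / total
                          for d in range(dim)]
        # Objective function J_m(U, V).
        objective = sum(u[i][j] ** m * sq_dist(data[j], centers[i])
                        for i in range(c) for j in range(n))
        if abs(previous - objective) <= tol:
            break
        previous = objective
    return u, centers, objective


# Hypothetical one-feature descriptors of six consumers.
data = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
u, centers, objective = fcm(data, c=2, m=2.0)
```

Each column of `u` sums to one, so every object distributes a total membership of one across the clusters, and a crisp grouping can be obtained afterwards by assigning each object to the cluster with its largest membership.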


FCM introduces the concept of the membership degree of fuzzy
partitions, and it has become a popular fuzzy clustering
method for load classification [19-24]. However, FCM is also
sensitive to the initial cluster centers, and the number of clusters
is also difficult to determine. Moreover, it easily falls into a
local optimum. All of these factors can affect the accuracy and
effectiveness of load classification. Optimizing FCM with
intelligent algorithms is an interesting research direction [24].


3.1.3. Hierarchical clustering method 

The main idea of the hierarchical method [13] is the bottom-up
aggregation or top-down split of groups in a data set until a
satisfactory classification result is formed. In agglomerate-type
hierarchical clustering, each object in the data set is first
regarded as a group, and larger groups are then formed by merging
two groups at a time based on a certain criterion (generally the
distances among clusters) until all the objects are in a single
cluster or a termination condition is met. The steps of split-type
hierarchical clustering are just opposite to the agglomerate-type.
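A minimal sketch of the agglomerate-type procedure is given below. It makes two assumptions that the description above leaves open: the distance between two clusters is taken as the smallest pairwise distance between their members (single linkage), and merging stops when a requested number of clusters is reached. The names `agglomerate` and `target_clusters`, and the sample data, are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two data objects."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def agglomerate(data: List[Vector], target_clusters: int) -> List[List[int]]:
    """Merge the two closest clusters until target_clusters remain.

    Clusters are returned as lists of indices into data.
    """
    clusters = [[j] for j in range(len(data))]  # each object starts as its own group

    def linkage(a: List[int], b: List[int]) -> float:
        # Single linkage (assumed): smallest pairwise distance between members.
        return min(sq_dist(data[p], data[q]) for p in a for q in b)

    while len(clusters) > target_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        _, a, b = min((linkage(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return clusters


# Hypothetical one-feature descriptors of six consumers.
data = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
groups = agglomerate(data, target_clusters=2)
```

Because a merge is never undone, an early merge of two dissimilar groups propagates to the final result, which is the weakness of the method discussed next.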

The steps of the hierarchical method are easy to implement,
and it is therefore widely used for load classification. However,
the selection of agglomerate and split points is difficult. Because
each clustering operation builds on the former steps, and the
former steps cannot be changed, the final clustering result is
directly affected by the merging or split operation in each step.
Hence, the hierarchical clustering method is also improved and
modified when used for load classification [25-28].


3.1.4. Self-organization mapping (SOM) 

SOM [14], also known as the Kohonen neural network, is an
unsupervised neural network method. SOM is composed of an input
layer and a competitive (output) layer. There are N input neurons in
the input layer and M output neurons in the output layer. The
neurons in the input layer and the output layer are interconnected,
and the winning neuron is the output neuron that is nearest to the
input presented on the N input neurons.

An evaluation function is not needed in SOM, and it can identify
the most significant characteristics with self-stability. SOM also
has a strong anti-noise ability. For these reasons, SOM is
widely used for load classification [29-32]. However, the learning
efficiency depends on the input order of sample objects when the
number of objects is small. Factors such as the weights of the
network connections, the adjustment of learning efficiency, and the
selection of the neighborhood function can significantly affect the
performance of SOM, and thereby the effectiveness of load
classification.
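The competition between output neurons can be illustrated with a single training step. The sketch below assumes a one-dimensional map, a Gaussian neighborhood function, and the usual rule that moves the weight vectors of the winner and its neighbors toward the input; none of these choices is fixed by the description above. The names `som_step`, `learning_rate` and `sigma`, and the sample values, are hypothetical.

```python
import math
from typing import List, Sequence


def som_step(weights: List[List[float]], x: Sequence[float],
             learning_rate: float, sigma: float) -> int:
    """Present one input vector to a 1-D map; update weights in place.

    weights[i] is the weight vector (length N) of output neuron i.
    Returns the index of the winning neuron.
    """
    # Competition: the winner is the output neuron whose weights are nearest to x.
    winner = min(range(len(weights)),
                 key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))
    # Adaptation: the winner and its map neighbors move toward the input.
    for i, w in enumerate(weights):
        h = math.exp(-((i - winner) ** 2) / (2.0 * sigma ** 2))  # Gaussian neighborhood
        for d in range(len(w)):
            w[d] += learning_rate * h * (x[d] - w[d])
    return winner


# Two output neurons (M = 2), one input neuron (N = 1).
weights = [[0.0], [1.0]]
winner = som_step(weights, [0.8], learning_rate=0.5, sigma=0.5)
```

In a full training run this step is repeated over all sample objects while the learning rate and neighborhood width are gradually reduced, which is why the settings of these quantities affect the final map.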


3.2. Evaluation methods for load classification results 


Since clustering is an unsupervised process, the load data 
objects in data sets are unlabeled and no structural knowledge 
about the data set is available. Hence, measuring the quality of 
clustering results and determining the optimal number of clusters 
are difficult tasks. The most commonly used approach to
determine the optimal number of clusters is to execute the
clustering algorithm several times with different numbers of
clusters and then select the number of clusters that provides the
best result according to a predefined criterion function. This
criterion function is called a cluster validity index (CVI). When
the number of
clusters and other parameters of clustering algorithm are fixed, 
CVI can be used to evaluate and validate the results of load 
classification. Currently, a large number of CVIs have been
proposed and reviewed [40-42]. Previous studies on CVIs have
demonstrated that no single CVI can deal with every data set and
always perform better than the others. However, they agree on the
basic principle that a good partition should have a small
intra-cluster variance and a large inter-cluster separation
at the same time. Here, we review some well-known CVIs which
can be used for the evaluation of load classification results and
present a brief summary in Table 2.
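The selection of the number of clusters by a CVI can be sketched with the Dunn index, which is the ratio of the smallest distance between objects in different clusters to the largest distance between objects in the same cluster, so that larger values indicate a better partition. The sketch assumes that candidate partitions for each number of clusters have already been produced by some clustering algorithm; here they are simply written out by hand. It uses the squared distance from the Nomenclature, and the names `dunn_index` and `candidates`, and the sample data, are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two data objects."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def dunn_index(data: List[Vector], clusters: List[List[int]]) -> float:
    """Dunn index of a crisp partition given as lists of indices into data.

    Assumes at least two clusters and at least one cluster with two
    distinct objects, so that the largest diameter is not zero.
    """
    separation = min(sq_dist(data[p], data[q])
                     for a in range(len(clusters))
                     for b in range(a + 1, len(clusters))
                     for p in clusters[a] for q in clusters[b])
    diameter = max(sq_dist(data[p], data[q])
                   for members in clusters for p in members for q in members)
    return separation / diameter


# Hypothetical one-feature descriptors of six consumers.
data = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
# Hypothetical candidate partitions for c = 2 and c = 3.
candidates: Dict[int, List[List[int]]] = {
    2: [[0, 1, 2], [3, 4, 5]],
    3: [[0, 1], [2], [3, 4, 5]],
}
scores = {c: dunn_index(data, clusters) for c, clusters in candidates.items()}
best_c = max(scores, key=scores.get)
```

The partition with two clusters scores higher because splitting a compact group reduces the inter-cluster separation without reducing the largest intra-cluster diameter.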


4. Applications of load classification 


4.1. Bad data identification and correction based on load 
classification 


The bad data existing in the load data set can affect the correct
decision-making of power producers, and even affect the daily
running and the safety of power systems [59]. In a smart grid
environment, power producers and managers must identify bad load
data accurately and process them appropriately.

Many studies have focused on bad data identification and
correction based on load classification. Zhang et al. [60] presented
an intelligent cleaning model for bad data based on load
classification using a Kohonen neural network optimized by fuzzy
soft clustering. Wang et al. [61] identified bad data effectively
using the K-means clustering algorithm based on a cluster validity
index, thereby reducing the amount of undetected and falsely
detected bad data. Similarly, the bad data in transmission grid
state estimation were detected, identified and corrected by the
K-means algorithm combined with a validity index in [62].
Additionally, Jiang et al. [63] identified the bad data according to
the good data classification obtained by fuzzy equivalent matrix
clustering.

Existing studies have demonstrated that the effectiveness and
practicality of bad data identification and correction can be
improved by load classification. The results obtained by load
classification are the input of bad data identification and
correction, and they are an important influencing factor.
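How classification results can feed bad data identification is illustrated by the following sketch. It is an illustrative assumption rather than the method of any of the cited studies: a reading is flagged as bad when it deviates from the characteristic load profile of its group by more than a fixed threshold, and a flagged reading is replaced by the characteristic value. The names `flag_bad_points`, `correct_bad_points` and `threshold`, and the sample values, are hypothetical.

```python
from typing import List, Sequence


def flag_bad_points(profile: Sequence[float], characteristic: Sequence[float],
                    threshold: float) -> List[bool]:
    """Flag readings that deviate from the group's characteristic profile."""
    return [abs(value - reference) > threshold
            for value, reference in zip(profile, characteristic)]


def correct_bad_points(profile: Sequence[float], characteristic: Sequence[float],
                       flags: Sequence[bool]) -> List[float]:
    """Replace flagged readings by the characteristic value at the same time."""
    return [reference if bad else value
            for value, reference, bad in zip(profile, characteristic, flags)]


# Hypothetical characteristic profile of a group and one measured profile
# from that group, both normalized; the third reading is corrupted.
characteristic = [0.2, 0.3, 0.5, 0.4]
measured = [0.2, 0.3, 5.0, 0.4]
flags = flag_bad_points(measured, characteristic, threshold=0.3)
cleaned = correct_bad_points(measured, characteristic, flags)
```

The quality of the characteristic profile, and therefore of the underlying classification, directly determines which readings are flagged.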


4.2. Load forecasting based on load classification 


Load forecasting is a hot research direction in demand-side 
management of power systems, especially in smart grid environ- 
ment, and various load forecasting methods have been proposed 
[64]. 

Load classification can also be used for load forecasting, and it
can improve the accuracy of load forecasting. Misiti et al. [65]
grouped the global electric power information based on clustering
methods, and then obtained the overall forecasting results by
combining the decomposed forecasting information. Li and Han [66]
presented a load forecasting method based on ant colony
clustering, which can improve the accuracy of load forecasting.
Also, Jota et al. [28] gave a method for forecasting daily load
curves and the peak load based on the typical daily curve and the
corresponding dynamic load model obtained by hierarchical
clustering.

Load classification can support the accurate forecasting of load in
many ways. For example, the total load can be forecast based on the
load of the different types of users obtained from load
classification. In addition, the typical load profile or
characteristic load of each type of consumer can be used as the
input data of load forecasting.
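As a worked illustration of the first approach, suppose the consumers are classified into $c$ groups, group $k$ contains $N_k$ consumers, and $\hat{p}_k(t)$ is the forecast typical load of one consumer in group $k$ at time $t$. These symbols are introduced here for illustration only and do not appear in the cited studies. Under the assumption that the group forecasts simply add up, the total load forecast is

$$\hat{L}(t)=\sum_{k=1}^{c} N_k\,\hat{p}_k(t)$$

For example, with $c=2$, $N_1=100$ consumers at $\hat{p}_1(t)=2$ kW and $N_2=10$ consumers at $\hat{p}_2(t)=50$ kW, the total forecast is $\hat{L}(t)=100\times 2+10\times 50=700$ kW.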


4.3. Tariff setting based on load classification 


Many countries are adopting deregulation and open market
policies for the electricity market in the smart grid environment.
These policies can promote competition in the electricity market,
improve the efficiency of investments and power system operation,
and reduce costs. In China, tariff reform has become the
core of the power system reform. Developing differentiated and
personalized tariffs according to the load constitution of the
distribution network and the consumption patterns of different
users is an important and significant research area [67].

Fig. 3. 72 load profiles.

Mahmoudi et al. [68] pointed out that knowledge of how and when
consumers use electricity is essential to the retailer in a
competitive environment, and proposed an annual framework for
optimal price offering by a retailer based on the clustering and
classification of consumers' load profiles. Chicco et al. [69]
analyzed the tariff setting and costs of power distribution
companies based on the classification of consumers' load profiles.
Huang et al. [70] presented a tariff decision-making model that
considers load classification and electricity use characteristics
such as the load rate, the power supply voltage level, the load
shape and the reliability requirements. Also, Ozveren et al. [71]
proposed a method for the automatic classification of large sets of
electrical demand profiles using fuzzy relation, and the
classification results can be used by Supply Business for tariff
development and end user costing.

The tariff setting is a complex process and various techniques,
such as optimization, decision-making, and economics, are required.
Load classification is an important decision support tool for tariff
setting.

Fig. 4. Load profiles in each group.

Fig. 5. Characteristic load profile of each group.


5. An example of load classification based on FCM 


In order to illustrate the process of load classification, we present
an example based on the FCM algorithm. The data used are 72 load
profiles of 6 types of electricity consumers in a city of
China [72]. Each load profile is a daily load profile measured every
hour. The load profiles are shown in Fig. 3.

The parameters in FCM are set as follows: the number of
clusters c=6, the fuzziness parameter m=2.5, and the initial cluster
centers are selected randomly. The algorithm is run 50 times and
the average values are taken as the result. Following the process
model described in Section 2, the load classification results are
shown in Figs. 4 and 5.

As Figs. 4 and 5 show, the shapes of the load profiles, which
indicate the electricity consumption patterns of different users,
differ between groups. For example, the load profiles in the second
group range from about 0.2 to 1.0, which is a large range, while the
load values in the sixth group are high throughout, with a smaller
range. Also, unlike the other five groups of load profiles, the
fifth group has two peaks, one of which appears at night.
More information and knowledge can be discovered
from the results of load classification. Based on these,
decision-making and policy development can be more effective and
efficient for both electricity producers and consumers.


6. Conclusion 


With the in-depth theoretical study and widespread application 
of smart grid, load classification will play an increasingly important 
role in decision-making of power systems and service provision of 
electricity market. Load classification methods are the premise and 
basis of load classification, and the analysis and applications are the 
ultimate goal of load classification. The difficulties and research 
directions of load classification in smart grid are as follows. 


(1) The influence of the complex smart grid environment on load
 classification. The load data in smart grid environment are
 massive, dynamic, high-dimensional, and heterogeneous. All
 these characteristics increase the difficulty of each stage of
 load classification, for example the efficient update of
 characteristic load patterns as consumers are added and deleted.

(2) The study of efficient and effective load classification methods. 
Traditional clustering algorithms, such as K-means, FCM and 
hierarchical method, are widely used, but the deficiencies of 
these methods have been demonstrated, which can signifi- 
cantly affect the effectiveness of load classification. Moreover, 
most traditional clustering methods are inefficient in dealing 
with the big load data in smart grid. Hence, more efficient new 
methods and the optimized traditional methods should be 
 developed. When evaluating and validating the load
 classification results, we should consider not only the values
 of CVIs, but also the characteristics of load data and the
 purpose of load classification.

(3) The study of before-classification preparation and after-
 classification analysis. In addition to bad data identification
 and processing, data normalization and the selection of load
 classification characteristics, data sampling and reduction
 methods are also important research topics in the preparation
 process of load classification. In the after-classification
 process, more effective and efficient methods for the
 evaluation, understanding and analysis of classification results
 need to be studied.

(4) The expansion study of load classification in demand-side 
management [73]. Besides bad data identification and correc- 
tion, load forecasting and tariff setting based on load classifi- 
cation, there are more applications based on the results of load 
 classification, which are interesting research directions.